Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: dropping nuisance columns in DataFrameGroupby apply, agg, transform #41475

Merged
merged 9 commits into from
May 26, 2021

Conversation

jbrockmendel
Copy link
Member

Discussed on this week's call.

@jbrockmendel jbrockmendel added Groupby Nuisance Columns Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply labels May 14, 2021
@@ -647,6 +647,7 @@ Deprecations
- Deprecated setting :attr:`Categorical._codes`, create a new :class:`Categorical` with the desired codes instead (:issue:`40606`)
- Deprecated behavior of :meth:`DatetimeIndex.union` with mixed timezones; in a future version both will be cast to UTC instead of object dtype (:issue:`39328`)
- Deprecated using ``usecols`` with out of bounds indices for ``read_csv`` with ``engine="c"`` (:issue:`25623`)
- Deprecated ignoring "nuisance columns" in :meth:`DataFrameGroupBy.agg`, :meth:`DataFrameGroupBy.apply`, and :meth:`DataFrameGroupBy.transform`; select valid columns before calling the method (:issue:`41475`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make this a sub-section (ok to lump this in with the DataFrame reductions PR sub-section as well), e.g. just a hey we are now deprecating both of these simliar things, with an example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. #41480

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated + green

@jreback jreback added this to the 1.3 milestone May 17, 2021
@jreback jreback added the Deprecate Functionality to remove in pandas label May 17, 2021
@jreback jreback merged commit f373bba into pandas-dev:master May 26, 2021
@jbrockmendel jbrockmendel deleted the depr-ignore-gb branch May 26, 2021 04:37
TLouf pushed a commit to TLouf/pandas that referenced this pull request Jun 1, 2021
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
@rhshadrach
Copy link
Member

In groupby._transform, I think we can always use _wrap_transform_fast_result in the case of a string aggregator instead of falling back to _transform_general for certain cases. The only thing stopping this now is it removes the warnings added here. Digging into it, I'm finding some inconsistencies:

df = DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
        "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
        "C": np.random.randn(8),
        "D": np.random.randn(8),
    }
)
df.groupby("A").mean()
df.groupby("A").agg("mean")
df.groupby("A").agg(np.mean)
df.groupby("A").transform("mean")  # <--- warns
df.groupby("A").transform(np.mean)  # <--- warns
df.groupby("A").apply("mean")
df.groupby("A").apply(np.mean)  # <--- warns

Should all, just the np.mean, or none of these warn? From the whatsnew note, it seemed to me that all of these should warn.

@jbrockmendel
Copy link
Member Author

Should all, just the np.mean, or none of these warn? From the whatsnew note, it seemed to me that all of these should warn.

I think you're right. IRC getting consistency across all these cases was something we punted on.

@rhshadrach
Copy link
Member

Thanks, I'll take a deeper look into emitting the warning in the right places.

@rhshadrach
Copy link
Member

rhshadrach commented Feb 19, 2022

I missed that the default of numeric_only is still currently True, so many of the above cases are currently correct. With #46072, all of the .mean() and "mean" forms will warn.

The np.mean should warn in all cases, but doesn't for agg. I plan to be putting up a PR for this.
I'm no longer certain on this. agg intercepts np.mean and replaces it with 'mean', which is why it doesn't warn. transform tries to do the same, but ends up having to call transform_item_by_item instead. apply also intercepts and replaces with 'mean', but then uses DataFrame._reduce which warns.

@rhshadrach rhshadrach mentioned this pull request Jun 13, 2024
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas Groupby Nuisance Columns Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dropping nuisance columns in groupby is a nuisance
3 participants